SAI API Performance Monitoring#2279

Open
JaiOCP wants to merge 1 commit into opencomputeproject:master from JaiOCP:perfmon

@JaiOCP JaiOCP commented Apr 21, 2026

This PR adds support for measuring SAI API performance. It is based on the presentation given at OCP 2023.

Signed-off-by: JaiOCP <jai.kumar@broadcom.com>

@rck-innovium rck-innovium left a comment

While most measurements can be taken at the application level, this proposal provides a way to measure metrics per object operation inside bulk APIs, which application-level performance monitoring cannot do.
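To illustrate the point, here is a minimal sketch of why per-object numbers have to come from inside the bulk call: a caller timing the whole call only sees the total. The names `bulk_apply`, `object_op_fn`, `sample_op`, and `now_ns` are hypothetical and are not part of SAI or of this PR. The sketch uses the C11 wall clock for portability; a real adapter would likely prefer a monotonic clock.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for one per-object operation inside an adapter. */
typedef void (*object_op_fn)(size_t index);

/* Counts calls so the example operation has an observable effect. */
static size_t sample_op_calls = 0;

static void sample_op(size_t index)
{
    (void)index;
    sample_op_calls++;
}

/* Current time in nanoseconds from the C11 wall clock (assumption:
 * good enough for a sketch; it is not monotonic). */
static uint64_t now_ns(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Hypothetical bulk call: applies op to each of count objects, stores the
 * latency of each object in per_object_ns, and returns the summed latency.
 * An application timing this call from outside would observe only a total. */
static uint64_t bulk_apply(object_op_fn op, size_t count, uint64_t *per_object_ns)
{
    assert(op != NULL && per_object_ns != NULL);

    uint64_t total = 0;
    for (size_t i = 0; i < count; i++) {
        uint64_t start = now_ns();
        op(i);
        per_object_ns[i] = now_ns() - start;
        total += per_object_ns[i];
    }
    return total;
}
```

Calling `bulk_apply(sample_op, 4, per_object)` fills one latency slot per object; the returned total is the only figure an outside timer could approximate.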

* @type sai_uint64_t
* @flags READ_ONLY
*/
SAI_PERFMON_ATTR_PERFDATA,

Please specify the units of this data.

sai_attr_list[2].value.s32 = SAI_PERFMON_METRICS_AVERAGE_LATENCY;

// Configure Time Interval in msec
sai_attr_list[3].id = SAI_PERFMON_ATTR_METRICS_TIME_INTERVAL;

The SAI_PERFMON_ATTR_METRICS_TIME_INTERVAL attribute is missing, and its functionality is not specified. The spec defines the interval as always being between two invocations for a given ObjType+API_type.

@rck-innovium

As discussed, the community concluded that we should not preserve this perfmon data across warmboot, especially since we thought it does not make sense for warm upgrades/downgrades.

Comment thread inc/saiswitch.h
* @objects SAI_OBJECT_TYPE_PERFMON
* @default empty
*/
SAI_SWITCH_ATTR_PERFMON_LIST,

As discussed on the call, we can rely on object creation to start the collection.

If so, change this attribute to read-only: that still allows enumerating the objects, and the metadata will enforce it.

These metrics can be used to:
- Improve SAI adapter and SDK implementations
- Provide a baseline for comparing different hardware
- Instantaneous value: Provides [time, n], where n > 1 represents the number of objects in a bulk API, or n = 1 represents the last observed latency for a single object

@j-bos j-bos Apr 23, 2026

I think the 'n' part is no longer kept. The description can be simplified to "last observed latency for API call".


3 participants